fix(skills): cursor in CLI help, drop deprecated --force and stale judge vars in tutorial (#133) #148

Closed
Dongbumlee wants to merge 7 commits into develop from fix/issue-133-copilot-skills-validation

@Dongbumlee
Collaborator

Closes #133.

Summary

Validated docs/tutorial-copilot-skills.md end-to-end. Three drifts fixed across one code file and one doc file.

Drifts fixed

| # | Issue | Fix |
|---|---|---|
| 1 | CLI help for `agentops skills install --platform` listed only 'copilot, claude', but cursor is fully supported in `services/skills.py` (auto-detection at line 130, register function `_register_cursor` at line 598, layout entry at line 36). Tutorial lines 66-68 already mention cursor correctly; the help text was the outlier. | Updated `cli/app.py:533` to list all three platforms. |
| 2 | Tutorial Section 2 used `agentops skills install --platform copilot --force`. `--force` is now deprecated ('skills are always overwritten with the latest version' per the help text). | Dropped `--force` from the example. |
| 3 | Tutorial recommended setting `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_DEPLOYMENT` alongside `AZURE_AI_MODEL_DEPLOYMENT_NAME`. This is stale after PR #141: the deployment-only override now works against the Foundry project endpoint. | Simplified to `AZURE_AI_MODEL_DEPLOYMENT_NAME` only; added a one-liner noting when the `AZURE_OPENAI_*` pair is still required (separate judge resource). |
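For illustration, fixes 1 and 2 amount to a flag surface like the one below: `--platform` accepts all three targets, and `--force` is still accepted for back-compat but does nothing beyond emitting a deprecation warning. This is a hedged argparse sketch; the real `cli/app.py` may use a different CLI framework, and `build_parser`, `parse_install_args`, and the exact help strings are hypothetical.

```python
import argparse
import warnings

# All three supported targets, so the help text cannot drift from the choices.
PLATFORMS = ("copilot", "claude", "cursor")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `agentops skills install` (not the real cli/app.py)."""
    parser = argparse.ArgumentParser(prog="agentops skills install")
    parser.add_argument(
        "--platform",
        choices=PLATFORMS,
        help="Target platform: " + ", ".join(PLATFORMS) + ".",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Deprecated: skills are always overwritten with the latest version.",
    )
    return parser


def parse_install_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.force:
        # Accepted for back-compat only; it no longer changes install behavior.
        warnings.warn("--force is deprecated and has no effect", DeprecationWarning, stacklevel=2)
    return args
```

Deriving the help string from the same tuple as `choices` is one way to keep a newly supported platform from being missing in the help text, which is the drift fixed here.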

Verification (clean /tmp)

| Command | Outcome |
|---|---|
| `agentops skills install --platform copilot` | ✅ creates `.github/copilot-instructions.md` + 6 `SKILL.md` files (all 7 listed in the tutorial) |
| `agentops skills install --platform claude` | ✅ creates `.claude/commands/agentops-*.md` (6 files) |
| `agentops skills install --platform cursor` | ✅ creates `.github/skills/...` + `.cursor/rules/agentops.mdc` |
| `agentops skills install --platform copilot --force` | ✅ still works (back-compat); `--help` text correctly marks it deprecated |
| `agentops agent analyze --help` | `--severity-fail critical` exists as documented |
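The install rows of the verification table can be turned into a repeatable check. A minimal sketch, assuming the per-platform outputs listed above; the glob patterns (in particular `.github/**/SKILL.md` for the copilot skill files, whose exact directory is not stated) and the `EXPECTED` / `missing_outputs` names are hypothetical, not part of AgentOps.

```python
from pathlib import Path

# Hypothetical expectations derived from the verification table:
# (glob pattern relative to the project root, minimum number of matches).
EXPECTED = {
    "copilot": [(".github/copilot-instructions.md", 1), (".github/**/SKILL.md", 6)],
    "claude": [(".claude/commands/agentops-*.md", 6)],
    "cursor": [(".cursor/rules/agentops.mdc", 1)],
}


def missing_outputs(root: Path, platform: str) -> list[str]:
    """Describe every expectation for `platform` that is not met under `root`."""
    problems = []
    for pattern, minimum in EXPECTED[platform]:
        found = len(list(root.glob(pattern)))
        if found < minimum:
            problems.append(f"{pattern}: expected >= {minimum}, found {found}")
    return problems
```

Running it in a clean temporary directory after each `agentops skills install --platform ...` call and asserting an empty list would reproduce the ✅ rows.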

Tests

Full suite: 290 passed, 1 skipped (no test changes — code change is a 1-line help-text update; tutorial fixes are doc-only).

Note for reviewers

Branched off fix/issue-132-baseline-comparison-validation (PR #147). Once that merges, this PR's diff against develop reduces to the 3 fixes here.

Also: issue #131 (conversational agent) was parked with a detailed comment. AgentOps does not implement multi-turn conversation, history, or extra_fields cross-row state, so the feature that differentiates that tutorial is absent. Work resumes after the product decision.

DB Lee added 7 commits May 12, 2026 09:25
Azure OpenAI's GPT-5 and o-series reasoning models reject the legacy
'max_tokens' parameter and require 'max_completion_tokens'. The
azure-ai-evaluation SDK only switches its built-in evaluators (Coherence,
Fluency, Similarity, etc.) to the new parameter when it is constructed
with is_reasoning_model=True.

This change auto-detects reasoning-model deployments by name and passes
is_reasoning_model=True to each evaluator's constructor, so users can
judge with gpt-5.x, o1, o3, or o4 deployments without manual config.

Detection pattern: deployment names starting with gpt-5, gpt5, o1, o3,
or o4 (case-insensitive). Override with the env var
AGENTOPS_EVALUATOR_REASONING_MODEL when an alias hides the real model
family.
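A minimal sketch of the detection rule as described. The prefix list and case-insensitivity come from the commit message; the function name and the override's value format (truthy/falsy strings such as `true`/`false`) are assumptions, since the message does not say which values `AGENTOPS_EVALUATOR_REASONING_MODEL` accepts.

```python
import os
from typing import Mapping, Optional

REASONING_PREFIXES = ("gpt-5", "gpt5", "o1", "o3", "o4")
_TRUTHY = {"1", "true", "yes", "on"}    # assumed override values
_FALSY = {"0", "false", "no", "off"}    # assumed override values


def is_reasoning_deployment(deployment: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether evaluators should be built with is_reasoning_model=True."""
    env = os.environ if env is None else env
    override = env.get("AGENTOPS_EVALUATOR_REASONING_MODEL", "").strip().lower()
    # Assumed semantics: an explicit truthy/falsy override wins over name matching.
    if override in _TRUTHY:
        return True
    if override in _FALSY:
        return False
    # Case-insensitive prefix match; str.startswith accepts a tuple of prefixes.
    return deployment.strip().lower().startswith(REASONING_PREFIXES)
```

Because the match is on the prefix, Foundry-suffixed names such as `gpt-5-443723` are still detected, while an alias like `my-judge` needs the override.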

Also documents the model-direct judge defaults: when only
AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is set, the judge defaults to the
target deployment and the endpoint is derived from the Foundry project
URL.

Previously, switching the AI-assisted evaluator judge to a different
deployment required setting both AZURE_OPENAI_ENDPOINT and
AZURE_OPENAI_DEPLOYMENT - even when the judge lived in the same Foundry
project as the target. Users had to manually compute the classic Azure
OpenAI endpoint URL.

Now AZURE_OPENAI_DEPLOYMENT (or AZURE_AI_MODEL_DEPLOYMENT_NAME) alone is
sufficient when AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is set: AgentOps
reuses the Foundry-derived data-plane endpoint. Users with a fully
separate Azure OpenAI judge resource still set both vars (unchanged).

Endpoint-only overrides remain rejected so AgentOps never silently
judges with the wrong deployment.
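The judge-resolution rules in this commit can be summarized as a small decision function. This is a hedged sketch, not AgentOps' implementation: the precedence of `AZURE_OPENAI_DEPLOYMENT` over `AZURE_AI_MODEL_DEPLOYMENT_NAME`, the error when no Foundry project endpoint is available, and the hypothetical `derive_endpoint` helper (which simply keeps the scheme and host of the project URL) are all assumptions.

```python
from typing import Mapping, NamedTuple
from urllib.parse import urlparse


class JudgeConfig(NamedTuple):
    endpoint: str
    deployment: str


def derive_endpoint(project_endpoint: str) -> str:
    # HYPOTHETICAL mapping: keep only scheme + host of the Foundry project URL.
    parsed = urlparse(project_endpoint)
    return f"{parsed.scheme}://{parsed.netloc}"


def resolve_judge(env: Mapping[str, str], target_deployment: str) -> JudgeConfig:
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    deployment = env.get("AZURE_OPENAI_DEPLOYMENT") or env.get("AZURE_AI_MODEL_DEPLOYMENT_NAME")
    project = env.get("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT")

    if endpoint and deployment:
        # Fully separate Azure OpenAI judge resource: both vars set (unchanged behavior).
        return JudgeConfig(endpoint, deployment)
    if endpoint:
        # Endpoint-only override is rejected so the judge never uses the wrong deployment.
        raise ValueError("AZURE_OPENAI_ENDPOINT requires a judge deployment name")
    if not project:
        raise ValueError("set AZURE_AI_FOUNDRY_PROJECT_ENDPOINT or both AZURE_OPENAI_* vars")
    # Deployment-only override, or no override: reuse the Foundry-derived endpoint.
    return JudgeConfig(derive_endpoint(project), deployment or target_deployment)
```

With only the project endpoint set, the judge falls back to the target deployment; adding `AZURE_AI_MODEL_DEPLOYMENT_NAME` swaps the deployment without touching the endpoint.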

Also documents the deployment-name lookup tip (Foundry suffixes
deployment names with random IDs, e.g. gpt-4.1-443723) in the
model-direct tutorial.

Refs #126

…e prereqs

Drift surfaced while validating the minimal quickstart tutorial (#125):

- 'agentops init' now creates three files (added .gitignore at the
  project root) - doc previously said two.
- The evaluator override YAML example used plain strings, but the
  AgentOpsConfig schema requires '- name: <ClassName>' entries.
  Updated the snippet so users can copy/paste it without a
  ValidationError.
- AI-assisted evaluator prerequisites no longer mandate
  AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_DEPLOYMENT for Foundry
  targets - the judge defaults to the target deployment after
  PR #141. Marked the env vars as 'separate judge resource only'.
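To illustrate the schema point, a minimal sketch that checks evaluator entries the way the corrected YAML expects: each item is a mapping with a `name` key rather than a bare string. The YAML is shown as equivalent Python data; the `evaluators` key and the `check_evaluator_entries` helper are hypothetical stand-ins for the real `AgentOpsConfig` validation, and `CoherenceEvaluator` / `FluencyEvaluator` are only example class names.

```python
from typing import Any

# Rejected YAML:          evaluators:
#                           - CoherenceEvaluator
OLD_STYLE = {"evaluators": ["CoherenceEvaluator"]}

# Corrected YAML:         evaluators:
#                           - name: CoherenceEvaluator
#                           - name: FluencyEvaluator
NEW_STYLE = {"evaluators": [{"name": "CoherenceEvaluator"}, {"name": "FluencyEvaluator"}]}


def check_evaluator_entries(config: dict[str, Any]) -> list[str]:
    """Return evaluator class names, or raise ValueError on plain-string entries."""
    names = []
    for index, entry in enumerate(config.get("evaluators", [])):
        if not isinstance(entry, dict) or not isinstance(entry.get("name"), str):
            raise ValueError(f"evaluators[{index}] must be a mapping like '- name: <ClassName>'")
        names.append(entry["name"])
    return names
```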

Refs #125

…orted legacy agent claim

Validated docs/tutorial-basic-foundry-agent.md end-to-end against a
freshly-created named agent (qa-bot:1) on the new Foundry experience
and corrected three drifts:

- Dataset path was '.agentops/data/smoke-agent-tools.jsonl', a file
  that 'agentops init' has never created. Use the actual seed at
  '.agentops/data/smoke.jsonl'.
- Judge model was sourced from AZURE_OPENAI_DEPLOYMENT, but Part 2
  never set it. After PR #141 the deployment-only override is enough,
  so Part 2 now exports AZURE_AI_MODEL_DEPLOYMENT_NAME and Part 3
  reflects that.
- Tutorial claimed 'AgentOps handles both [named and legacy asst_*]
  agents. Named agents use the Foundry Responses API; legacy agents
  use the Threads API.' Neither AgentOps nor the new Foundry Responses
  API supports asst_* legacy agents today: classify_agent() rejects
  the bare ID, and the Responses API requires a versioned name even
  for migrated assistants. Reframed the tutorial as 'named, versioned
  agents only', linked to the recreate-as-named-agent path, and
  tracked the gap in #143.
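A hedged sketch of the 'named, versioned agents only' rule: accept references of the form `<name>:<version>` (like `qa-bot:1`) and reject bare `asst_*` IDs. This is a hypothetical illustration, not the real `classify_agent()`; the accepted name characters and the numeric-version requirement are assumptions.

```python
import re

# Assumed shape of a named, versioned agent reference, e.g. "qa-bot:1".
_NAMED_AGENT = re.compile(r"(?P<name>[A-Za-z0-9][A-Za-z0-9_-]*):(?P<version>\d+)")


def parse_agent_ref(ref: str) -> tuple[str, int]:
    """Return (name, version) for a named agent; reject legacy and unversioned IDs."""
    if ref.startswith("asst_"):
        raise ValueError(f"{ref!r} is a legacy assistant ID; recreate it as a named agent")
    match = _NAMED_AGENT.fullmatch(ref)
    if match is None:
        raise ValueError(f"{ref!r} is not a versioned agent reference like 'qa-bot:1'")
    return match.group("name"), int(match.group("version"))
```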

Refs #127

…ents only

Validated docs/tutorial-rag.md end-to-end against a Foundry named agent
(qa-bot:1) on the new Foundry experience: 5 rows x 9 evaluators ran
cleanly, all thresholds passed.

Three doc fixes informed by runtime validation:

- Part 1 step 3: the knowledge-base/file-search tool is correctly listed
  as a Foundry feature, but is **optional** for this tutorial. The
  evaluator only sees the agent's final answer, not its internal
  retrieval, so a plain prompt agent works equally well for the eval
  loop. Reworded as optional with a note about when production users
  would want it.
- Part 4: 'context' bullet now describes the field as 'reference
  passages' instead of 'retrieved document context' (more accurate -
  AgentOps does not capture runtime retrieval). Replaced the vague
  'populate with retrieved passages' tip with two concrete workflows
  (manual reference passages vs pre-script retrieval).
- Notes: added named-agents-only constraint, mirroring the
  basic-foundry-agent tutorial.
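The 'pre-script retrieval' workflow can be sketched as: retrieve passages before the eval run and write them into each row's `context` field, since AgentOps does not capture the agent's runtime retrieval. Only the `context` field comes from the tutorial; the toy word-overlap retriever, the in-memory corpus, and the `input` field name are hypothetical.

```python
import json
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_passages(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question (stable on ties)."""
    q = _words(question)
    return sorted(corpus, key=lambda p: len(q & _words(p)), reverse=True)[:k]


def add_context(rows: list[dict], corpus: list[str], k: int = 2) -> list[str]:
    """Return JSONL lines with a 'context' field filled in before the eval run."""
    lines = []
    for row in rows:
        passages = top_passages(row["input"], corpus, k)  # "input" is a hypothetical field name
        lines.append(json.dumps({**row, "context": "\n\n".join(passages)}))
    return lines
```

A real pipeline would swap the toy retriever for the production search index, so the evaluators judge the answer against the same passages the agent would see.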

Refs #128. Capture-retrieval gap tracked in #145.

…gn doc

The baseline-comparison tutorial claimed the shipped PR workflow
'already supports' baseline comparison, but the generated
agentops-pr.yml ran 'agentops eval run --config <cfg>' with no
--baseline flag. Users had to manually edit the workflow.

Make the doc claim true: the PR workflow now auto-detects
.agentops/baseline/results.json and passes --baseline when present.
Without that file, behaviour is unchanged (no baseline, no comparison).
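The guard's logic, restated in Python for clarity: append `--baseline` only when `.agentops/baseline/results.json` exists. A sketch only; the actual guard is a shell step in the workflow template, and `build_eval_command` is a hypothetical name.

```python
from pathlib import Path

BASELINE = Path(".agentops/baseline/results.json")


def build_eval_command(config: str, root: Path = Path(".")) -> list[str]:
    """Mirror of the PR workflow guard: add --baseline only if the file is present."""
    command = ["agentops", "eval", "run", "--config", config]
    baseline = root / BASELINE
    if baseline.is_file():
        command += ["--baseline", str(baseline)]
    return command
```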

- src/agentops/templates/workflows/agentops-pr.yml: new shell guard in
  the 'Run AgentOps eval' step.
- docs/tutorial-baseline-comparison.md: section 4 now reflects the
  auto-detection (drop the file, no workflow edit needed).
- tests/unit/test_cicd.py: assert the generated workflow contains the
  baseline-detection block.

Verified end-to-end: ran the baseline comparison flow in /tmp and
confirmed results.json carries the documented top-level 'comparison'
block (baseline_path, baseline_started_at, baseline_overall_passed,
metrics[], rows[]).
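The same check can be run locally by loading `results.json` and confirming the documented top-level `comparison` keys. A minimal sketch: it checks only the key names listed above and assumes nothing about `metrics` and `rows` beyond their being lists; `summarize_comparison` is hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("baseline_path", "baseline_started_at", "baseline_overall_passed", "metrics", "rows")


def summarize_comparison(results_path: Path) -> dict:
    """Load results.json and return a short summary of its 'comparison' block."""
    data = json.loads(results_path.read_text(encoding="utf-8"))
    comparison = data.get("comparison")
    if comparison is None:
        raise KeyError("no 'comparison' block: was --baseline passed?")
    missing = [key for key in REQUIRED_KEYS if key not in comparison]
    if missing:
        raise KeyError(f"comparison block is missing {missing}")
    return {
        "baseline_path": comparison["baseline_path"],
        "baseline_overall_passed": comparison["baseline_overall_passed"],
        "metric_count": len(comparison["metrics"]),
        "row_count": len(comparison["rows"]),
    }
```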

Refs #132

…force and stale judge env vars from tutorial

Validated docs/tutorial-copilot-skills.md end-to-end:
`agentops skills install` produces all 7 documented files
(.github/copilot-instructions.md + 6 SKILL.md) for copilot, equivalent
.claude/commands/*.md for claude, and .cursor/rules/agentops.mdc for
cursor. Watchdog commands (agent analyze, agent serve) match the
tutorial.

Three drifts fixed:

- CLI help for 'agentops skills install --platform' listed
  'copilot, claude' but cursor is fully supported in
  services/skills.py (auto-detection, register function, layout).
  Updated app.py:533 help text.
- Tutorial Section 2 invoked 'agentops skills install --platform
  copilot --force'. --force is now deprecated ('skills are always
  overwritten with the latest version' per the help text); dropped
  it from the example.
- Tutorial Section 'Set local evaluator variables' recommended
  exporting AZURE_OPENAI_ENDPOINT + AZURE_OPENAI_DEPLOYMENT alongside
  AZURE_AI_MODEL_DEPLOYMENT_NAME. Stale after #141 (deployment-only
  override now works against the Foundry project endpoint).
  Simplified to AZURE_AI_MODEL_DEPLOYMENT_NAME only, with a note
  about when the AZURE_OPENAI_* pair is still needed (separate judge
  resource).

Refs #133.

@Dongbumlee
Collaborator Author

Closing to re-run the validation against the current develop. develop has advanced by roughly 10 or more commits since this PR was opened (including significant changes to runtime.py, the tutorials, and the CLI surface), so the architecture and text this PR built on are no longer the baseline.

Will reopen scoped, smaller PRs per backlog issue after re-running each tutorial against the latest develop.

Tracked in plan.md (session state).
